Search Results: "matrix"

30 September 2021

Dirk Eddelbuettel: RcppArmadillo 0.10.7.0.0 on CRAN: New Upstream

Armadillo is a powerful and expressive C++ template library for linear algebra, aiming towards a good balance between speed and ease of use, with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 912 other packages on CRAN. This new release brings us Armadillo 10.7.0, released this morning by Conrad. Leading up to this were three runs of reverse-dependency checks, the first of which uncovered the need for a small PR for subview_cols support, which Conrad kindly supplied. The full set of changes follows. We include the last interim release (sent, as usual, to the drat repo) as well.

Changes in RcppArmadillo version 0.10.7.0.0 (2021-09-30)
  • Upgraded to Armadillo release 10.7.0 (Entropy Maximizer)
    • faster handling of submatrix views accessed by X.cols(first_col,last_col)
    • faster handling of element-wise min() and max() in compound expressions
    • expanded solve() with solve_opts::force_approx option to force use of the approximate solver

Changes in RcppArmadillo version 0.10.6.2.0 (2021-08-05)
  • Upgraded to Armadillo release 10.6.2 (Keep Calm)
    • fix incorrect use of constexpr for handling fixed-size matrices and vectors
    • improved documentation
  • GitHub- and drat-only release

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

14 September 2021

Debian Social Team: Some site updates

We're in the process of upgrading to Debian 11 (bullseye). If you come across any issues, feel free to raise them in the -social IRC channel on OFTC (also accessible via Matrix) and we'll look into them as soon as we have a chance. We're aware that live streaming isn't currently working on our PeerTube instance; we have had some issues with this relatively new feature before, although at the moment we seem to be affected by upstream issue #4390.

9 September 2021

Debian Social Team: Matrix Synapse updated and new plumbed IRC rooms

Matrix Synapse was updated to 1.40.0; during the upgrade, the server was upgraded to Bullseye. The following Matrix rooms were plumbed to IRC:

1 September 2021

Paul Wise: FLOSS Activities August 2021

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration
  • Debian servers: expand LV, fix debbugs config
  • Debian wiki: unblock IP addresses, approve accounts
  • Debian QA services: deploy changes

Communication

Sponsors

The pyemd, pytest-rerunfailures, libpst, sptag, librecaptcha work was sponsored by my employer. All other work was done on a volunteer basis.

23 August 2021

Leandro Doctors: Clojure CLI Tools in Debian - GSoC 2021 Final Report

NOTE: this blog post is based on my "Clojure CLI Tools in Debian" GSoC 2021 project Final Report.

Intro

My name is Leandro Doctors (allentiak on IRC), and I've been the GSoC intern working with the Debian Clojure Team during 2021. This is my final report. You can also check my original proposal and my first report.

Summary

Whereas the raw data may not sound very positive by itself, my personal conclusion is positive. That is, whereas I didn't fully finish the required deliverables envisioned in my original proposal, I do feel I am much closer to, eventually, becoming a Debian Developer. So, by all means, I consider this project to have had a positive outcome.

Project

The goal of the Clojure Build Tools in Debian project was to provide Debian's Clojure users with some of the latest advanced build tools and libraries the upstream Clojure developers have lately been working on. These include tools.deps.alpha, a library for dependency graph resolution and classpath building, and the CLI tool clj, for REPL interaction. If time permitted, I was also to improve the quality of both new and existing Clojure packages, and the overall Debian Clojure packaging process. My mentor was Louis-Philippe Véronneau, and my co-mentor was Utkarsh Gupta.

Motivation

Why this project? On the one hand, if you're a Clojure lover like me, you may have noticed that the Clojure experience in Debian is, as of mid-2021, well... still quite limited. Additionally, this project aligned with my own background in Free Software community building and my research interest in Peer Production.

When I mention how limited today's Clojure experience in Debian is, I can see two reasons for this, deeply intertwined. The first one is that there currently aren't many Clojure-specific packaging tools in Debian (such as a clojure-debian-helper). The second reason, and probably the root of the first, is that many core build tools and libraries for the language simply have not been packaged yet. My project aimed to attack that root cause.

As I said, another reason for choosing this project is my own experience as the co-founder and leader of probably the first Free Software community in my hometown of San Juan, Argentina. That interest in Free Software evolved into a first PhD attempt in what is now known as the field of Peer Production, a subject that has stayed with me as a research interest during my day job at a university.

Being a Clojure fan, it felt only logical to combine all those interests somehow. And this project seemed like the ideal combination.

The Debian Clojure Team

I've been working with a small, yet very warm team. The current incarnation of the Debian Clojure Team exists thanks to the hard work of three people. Elana Hashman (aka "the Clojure necromancer") revived the team around three years ago. Later on, the team gained the invaluable presence of Louis-Philippe Véronneau and Utkarsh Gupta (my mentor and co-mentor, respectively). Together, these Three Musketeers have kept the team alive, allowing us, Debian users, to enjoy Clojure.

Status

During the first part of my project, I mainly worked on learning the basics of Debian packaging, and got my first package uploaded. I have to thank Louis-Philippe, Utkarsh, and Elana for their immense patience and support during that part, as it took me quite some effort to grasp the basics of Debian packaging.

During the second part of my project, I worked on my last packages, and almost completed the originally required scope of the project. I only have to finish working on the transition from the currently provided set of packages (based on a Debian-specific clojure runner) to the newly provided upstream clojure and clj runners.

Unfortunately, I didn't have much time left to start working on the opportunities for improvement already identified by the Debian Clojure Team and originally outlined in my proposal. Whereas I did update one older Clojure package not built using leiningen (tools-data-xml-clojure), I didn't write any Lintian tags to make Clojure packaging in Debian more robust, nor did I work towards the automation of Clojure unit tests in autopkgtests via autodep8.

Deliverables: Data vs. Conclusions

If we are to talk about deliverables, we should start with the data. According to my original proposal, I was required to provide both new and updated Clojure packages accepted into Debian unstable, and updated Clojure packaging documentation. Additionally, if time permitted, I was to also provide new Clojure Lintian tags merged by the Lintian maintainers, and new Clojure autodep8 scripts merged by the autodep8 maintainers. Whereas I partially accomplished both required tasks, I didn't manage to start working on any of the optional deliverables.

When looked at in isolation, those numbers may look somewhat disappointing to some people. However, I can draw a much more positive conclusion. Why? Firstly, GSoC is supposed to be a learning experience. Moreover, as I said in my original proposal, I approached this project as "a great opportunity to, finally, start my journey towards becoming a Debian Developer". In that sense, I consider the time invested in this project fruitful. I have learned the basics of packaging, learned how to interact with the Debian Clojure Team, and already got my first packages accepted. Plus, I'm looking forward to continuing to work with the Debian Clojure Team so I can attain the original scope of the project. Therefore, all things considered, I consider this experience a moderate success.

Lessons Learned

Technically speaking, if I have learned one thing during these weeks, it is that packaging, although easy to underestimate, is by no means a trivial process. As any Debian Developer surely knows, the onboarding process can take some time. Plus, what is easy for some people can be difficult for others. In my case, this was quite evident. Whereas I can speak several languages and learning new ones takes me little effort, grasping the basics of packaging took me (literally!) blood, sweat and tears. Indeed, the packaging learning curve was quite steep for me. That being said, I did learn a thing or two about packaging. So, if I managed to get here, I'm sure many others can. It may take them more or less time than it took me, but learning (at least the basics of) packaging is an achievable goal.

Technical skill learning aside, I value very highly the non-technical skills I have improved during this project. For instance, I learned that it can take some time to adapt to real-time online communication. Before this project, remote working meant either exchanging emails or getting into video or audio calls, with little emphasis on chat-based interaction. Early on, I realized that the Debian Clojure Team interacts almost exclusively via, well... chat! And those two approaches are very different indeed. It has taken me some time to adjust, but I've improved greatly in this aspect as well.

Finally, improving my time management skills has also been a key part of this process. Whereas I had already been working remotely for over a year and a half, my day job is not as interaction-dependent as this project (especially in the beginning). So it took me some time to adapt to this way of working, and to plan my workload so I could use the waiting moments to advance other parts of the project. There is still a lot to improve here, but I am improving nevertheless.

Acknowledgments

I first have to thank upstream; more specifically, Alex Miller, one of the upstream developers of the clojure-tools. Every time I needed specific information on what specific parts of the Clojure CLI tools' codebase (such as tools.deps.alpha) do, he popped up a reply in a matter of hours. He has shown genuine interest in the success of this project by carefully replying to my emails with detailed explanations of code intent and form, both in private and in public conversations. Thank you for all that, Alex!

Let's move on to the Debian Clojure Team. First, Elana. I thank Elana for her initial openness when I first contacted her about this idea. It was *her* who initially contacted Louis-Philippe so he would become my mentor. I wouldn't have started to work on this project if it weren't for her. Plus, she provided quite a bit of advice on more than one occasion. So, thank you very much for all that, Elana!

I also thank Utkarsh, my co-mentor, for his overall technical advice. And a special mention for his initial help setting up my Matrix client for OFTC chat. At that moment, it was *him* who took the time to help me in real time so I could solve that problem. So, thank you very much for all that, Utkarsh!

I finally have to thank Louis-Philippe, my mentor, for his patient guidance during the whole process. His dedication and hard work have been *instrumental* to my progress. And a special mention for his tolerance with respect to some unforeseen personal circumstances I had to endure during the first weeks. When one is playing the newbie, times abound when one depends on other people's feedback. And Debian is made of volunteers, who have a life outside it. Every time I asked, Louis-Philippe was there. I wouldn't have gotten here if it weren't for him. So, thank you *so* much for all that, Louis-Philippe!

Final Words

I would like to close this report with a reflection. I have been using Debian for many, many years now, and I had been looking for a way to contribute back to the project for some time. I even did some work on a non-packaging Debian project. That being said, I never managed to deliver much, really. So, the very existence of outreach programs such as this one is, in my humble opinion, crucial. In my case, the funding I got through the GSoC program was instrumental in being able to allocate time for this endeavor, and to finally get started contributing to Debian. Plus, it has had a very positive impact on me, in many ways, some of which I am only starting to discover now that the project is ending.

When I put things into perspective, this project is very important to me. Actually, it is nothing but the first step in a long-term journey: becoming a Debian member. I hope to be able to apply for Debian membership by the end of this year.

Questions?

Thank you very much for your time reading this! I look forward to hearing (or reading) your feedback. Please come and meet the Debian Clojure Team. Moreover, I will be at the Clojure BoF at DebConf21. And do not hesitate to send me an email.

Data

Task Status
  • Required Tasks:
    • T1: Setting up a full Debian packaging development environment and learning the basics of Debian packaging.
      • Successfully completed the first part during the Application period.
      • Successfully completed the second part during the Coding periods.
    • T2: Identifying and packaging the missing dependencies to package clojure-cli.
      • Successfully completed as of the end of Coding II.
    • T3: Packaging clojure-cli.
      • 90% done as of the end of Coding II.
    • T4: Updating clojure to use clojure-cli.
      • To be completed after GSoC.
    • T5: Updating the Clojure Packaging Guide with information on how to use the new clojure-cli scripts.
      • Improved existing documentation. To be completed after GSoC.
  • Optional Tasks:
    • T6: Writing Lintian tags to make Clojure packaging in Debian more robust.
      • To be completed after GSoC.
    • T7: Working to automate Clojure unit testing in autopkgtests using autodep8.
      • To be completed after GSoC.
    • T8: Updating older Clojure packages not built using leiningen or clojure-cli.
      • To be completed after GSoC.

Packages
  1. Updated packages:
  2. New packages:
    • tools-gitlibs-clojure -- Clojure API for programmatically accessing git libraries. ITP: #905543 in NEW.
    • ITP: tools-deps-alpha-clojure -- functional API for dependency management and classpath creation https://bugs.debian.org/891136 Needs to be uploaded by Louis-Philippe.
  3. In-Progress packages:
    • ITP: clojure-cli -- upstream CLI entrypoints for Clojure https://bugs.debian.org/891141 90% done - Package completed. I only need to finish implementing the transition from existing clojure scripts. To be completed after GSoC.
  4. Opened ITPs:
  5. Reported bugs

Other
  1. Interactions with upstream in the Debian-Clojure mailing list:
    • Many productive interactions with one of the upstream developers, Alex Miller (June, August).
  2. Wiki page:
    1. https://wiki.debian.org/Clojure/Goals/ClojureCLIToolsInDebian

15 August 2021

Mike Gabriel: Chromium Policies Managed under Linux

For a customer project, I recently needed to take a closer look at the best strategies for deploying Chromium settings to thousands of client machines in a corporate network. Unfortunately, the information on how to deploy site-wide Chromium browser policies is a little scattered over the internet, and the intertwining of Chromium preferences and Chromium policies required deeper introspection. Here, I'd like to provide the result of that research, namely a list of references that I studied before setting up Chromium policies for the customer's proof-of-concept.

Difference between Preferences and Policies

Chromium can be controlled via preferences (mainly user preferences) and administratively rolled-out policy files. The difference between preferences and policies is explained here:
https://www.chromium.org/administrators/configuring-other-preferences

The site admin (or distro package maintainer) can pre-configure the user's Chromium experience via a master preferences file (/etc/chromium/master_preferences). This master preferences file is the template for the user's preferences file and gets copied into the Chromium user profile folder on first browser start.

Note: by studying the recent Chromium code, it was found out that /etc/chromium/master_preferences is the legacy filename of the initial preferences file. The new filename is /etc/chromium/initial_preferences. We will continue with master_preferences here, as most Linux distributions still provide the initial preferences via this file. Whereas the new filename is already supported by Chromium in openSUSE/SLES, it is not yet supported by Chromium in Debian/Ubuntu. (See Debian bug #992178.)

Difference between 'managed' and 'recommended' Policies

The difference between 'managed' and 'recommended' Chromium policies is explained here:
https://www.chromium.org/administrators/configuring-other-preferences

Quoting from the above URL (last visited 2021/08): "Policies that should be editable by the user are called 'recommended policies' and offer a better alternative than the master_preferences file. Their contents can be changed and are respected as long as the user has not modified the value of that preference themselves."

So, policies of type 'managed' override user preferences (and also lock them in the Chromium settings UI). Those 'managed' policies are good for enforcing browser settings. They can also be blended in for existing browser user profiles. Policies ('managed' and 'recommended') even get blended in at browser run-time when modified. Use case: rolling out browser security settings that are required for enforcing a site-policy-compliant browser user configuration.

Policies of type 'recommended' set defaults for the Chromium browser. They apply to already existing browser profiles, if the user hasn't tweaked the to-be-recommended settings yet. They also get applied at browser run-time. However, if the user has already fiddled with such a to-be-recommended setting via the Chromium settings UI, the user's choice takes precedence over the recommended policy. Use case: policies of type 'recommended' are good for long-term adjustments to browser configuration options. Especially if users don't touch their browser settings much, 'recommended' policies are a good approach for fine-tuning site-wide browser settings on user machines.

CAVEAT: While researching this topic, two problematic observations were made:
  1. Setting parameters put into the master preferences file (/etc/chromium/master_preferences) can't be superseded by 'recommended' Chromium policies. Pre-configured preferences are handled as if the user had already tinkered with those preferences in Chromium's settings UI. It was also discovered that distributors tend to overload /etc/chromium/master_preferences with their best-practice browser settings. Everything that is not required on first browser start should instead be provided as 'recommended' policies, ideally already in the distribution packages for Chromium.
  2. There does not seem to be an elegant way to override the package maintainer's choice of options in the /etc/chromium/master_preferences file via some drop-in file replacement. (See Debian bug #992179.) So, deploying Chromium involves post-install config file tinkering by hand, by script or by config management tools (a minimal scripted sketch follows below). There is room for improvement here.
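The post-install tinkering can at least be scripted. Here is a minimal sketch of that idea, assuming a hypothetical drop-in file with local overrides (the path /etc/chromium/master_preferences.d/site.json is an invented convention, not something Chromium reads itself):

#!/usr/bin/env python3
# Sketch: merge a local overrides file into /etc/chromium/master_preferences
# after package upgrades, as a stop-gap for the missing drop-in mechanism
# discussed above (Debian bug #992179).
import json

MASTER = "/etc/chromium/master_preferences"
OVERRIDES = "/etc/chromium/master_preferences.d/site.json"  # hypothetical path

def deep_merge(base, extra):
    # Recursively merge 'extra' into 'base'; values from 'extra' win.
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_merge(base[key], value)
        else:
            base[key] = value
    return base

with open(MASTER) as f:
    prefs = json.load(f)
with open(OVERRIDES) as f:
    prefs = deep_merge(prefs, json.load(f))
with open(MASTER, "w") as f:
    json.dump(prefs, f, indent=2)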
Managing Chromium Policy with Files

Chromium supports 'managed' policies and 'recommended' policies. Policies get deployed as JSON files. For Linux, this is explained here:
https://www.chromium.org/administrators/linux-quick-start

Note that for Chromium, the policy files have to be placed into /etc/chromium. The example on the above web page shows where to place them for Google Chrome.

Good 'How to Get Started' Documentation for Chromium Policy Setups

This overview page provides good getting-started documentation on how to provision Chromium via policies:
https://www.chromium.org/administrators/configuring-policy-for-extensions

First-Run Preferences

It seems not every setting can be tweaked via a Chromium policy. Especially the first-run preferences are affected by this:
https://www.chromium.org/developers/design-documents/first-run-customiza...

So, for tweaking the first-run settings, one needs to adjust /etc/chromium/master_preferences (which is suboptimal; again, see Debian bug #992179 for a detailed explanation of why). The required adjustments to master_preferences can be achieved with the jq command line tool; here is one example:
# Tweak chromium's /etc/chromium/master_preferences file.
# First change: drop everything that can be provisioned via Chromium Policies.
# Rest of the changes: Adjust preferences for new users to our needs for all
# parameters that cannot be provisioned via Chromium Policies.
cat /etc/chromium/master_preferences \
    | jq 'del(.browser.show_home_button, .browser.check_default_browser, .homepage)' \
    | jq '.first_run_tabs=[ "https://first-run.example.com/", "https://your-admin-faq.example.com" ]' \
    | jq '.default_apps="noinstall"' \
    | jq '.credentials_enable_service=false | .credentials_enable_autosignin=false' \
    | jq '.search.suggest_enabled=false' \
    | jq '.distribution.import_bookmarks=false | .distribution.verbose_logging=false | .distribution.skip_first_run_ui=true' \
    | jq '.distribution.create_all_shortcuts=true | .distribution.suppress_first_run_default_browser_prompt=true' \
    | cat > /etc/chromium/master_preferences.adapted
# Test with -s (file exists and is non-empty) rather than -n on a literal
# path string, which would always be true.
if [ -s /etc/chromium/master_preferences.adapted ]; then
        mv /etc/chromium/master_preferences.adapted /etc/chromium/master_preferences
else
        echo "WARNING (chromium tweaks): The file /etc/chromium/master_preferences.adapted was empty after tweaking."
        echo "                           Leaving /etc/chromium/master_preferences untouched."
fi
The list of available (first-run and other) initial preferences can be found in Chromium's pref_names.cc code file:
https://github.com/chromium/chromium/blob/main/chrome/common/pref_names.cc

List of Available Chromium Policies

The list of available Chromium policies used to be maintained in the Chromium wiki: https://www.chromium.org/administrators/policy-list-3 However, that page these days redirects to the Google Chrome Enterprise documentation:
https://chromeenterprise.google/policies/

Each policy variable has its own documentation page there. Please note the "Supported Features" section for each policy item; there you can see whether the policy may be placed into "recommended" and/or "managed". This is an example /etc/chromium/policies/managed/50_browser-security.json file (note that all kinds of filenames are allowed, even files without a .json suffix):
{
  "HideWebStoreIcon": true,
  "DefaultBrowserSettingEnabled": false,
  "AlternateErrorPagesEnabled": false,
  "AutofillAddressEnabled": false,
  "AutofillCreditCardEnabled": false,
  "NetworkPredictionOptions": 2,
  "SafeBrowsingProtectionLevel": 0,
  "PaymentMethodQueryEnabled": false,
  "BrowserSignin": false,
 
And this is an example /etc/chromium/policies/recommended/50_homepage.json file:
{
  "ShowHomeButton": true,
  "WelcomePageOnOSUpgradeEnabled": false,
  "HomepageLocation": "https://www.example.com"
 
And for defining a custom search provider, I use /etc/chromium/policies/recommended/60_searchprovider.json (here, I recommend not using DuckDuckGo as DefaultSearchProviderName, but some custom name; unfortunately, I did not find a policy parameter that simply selects an already existing search provider name as the default :-( ):
{
  "DefaultSearchProviderEnabled": true,
  "DefaultSearchProviderName": "DuckDuckGo used by Example.com",
  "DefaultSearchProviderIconURL": "https://duckduckgo.com/favicon.ico",
  "DefaultSearchProviderEncodings": ["UTF-8"],
  "DefaultSearchProviderSearchURL": "https://duckduckgo.com/?q= searchTerms ",
  "DefaultSearchProviderSuggestURL": "https://duckduckgo.com/ac/?q= searchTerms &type=list",
  "DefaultSearchProviderNewTabURL": "https://duckduckgo.com/chrome_newtab"
 
The Essence and Recommendations

On first startup, Chromium copies /etc/chromium/master_preferences to $HOME/.config/chromium/Default/Preferences. It does this only if the Chromium user profile hasn't been created yet. So, settings put into master_preferences by the distro and the site or device admin are one-shot preferences (a new user logs into a device, and the preferences get applied on the first start of Chromium).

Chromium policy files, however, get continuously applied at browser runtime. Chromium watches its policy files, and you can observe Chromium settings change when policy files get modified. So, for continuously provisioning site-wide settings that should always trickle into the user's browser configuration, Chromium policies should definitely be preferred over master_preferences.

When using Chromium policies, one needs to take into account that settings in /etc/chromium/master_preferences seem to take precedence over 'recommended' policies. So, settings that you want to deploy as recommended policies must be removed from /etc/chromium/master_preferences.

Essentially, these are the recommendations extracted from all the above research and information for deploying Chromium at enterprise scale:
  1. Everything that's required at first-run should go into /etc/chromium/master_preferences.
  2. Everything that's not required at first-run should be removed from /etc/chromium/master_preferences.
  3. Everything that's deployable as a Chromium policy should be deployed as a policy (this lets you influence existing browser sessions, also long-term).
  4. Chromium policy files should be split up into several files. Chromium parses those files in alphanumerical order. If a policy occurs more than once, the last value parsed takes precedence (a small sanity-check sketch follows below).
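To verify what that parse order means in practice, here is a small sanity-check sketch (it only assumes plain JSON policy files in the directory shown above):

#!/usr/bin/env python3
# Sketch: replicate Chromium's alphanumerical parse order to see which value
# wins for each policy when several files define the same key.
import glob
import json

effective = {}
# sorted() mirrors the parse order; later files overwrite earlier ones,
# so the last value parsed wins, just as described above.
for path in sorted(glob.glob("/etc/chromium/policies/managed/*")):
    with open(path) as f:
        effective.update(json.load(f))

for key, value in sorted(effective.items()):
    print(key, "=", value)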
Feedback

If you have any feedback or input on this post, I'd be happy to hear it. Please get in touch via the various channels where I am known as sunweaver (OFTC and libera.chat IRC, [matrix], Mastodon, e-mail at debian.org, etc.). Looking forward to hearing from you. Thanks!

light+love
Mike Gabriel (aka sunweaver)

2 August 2021

Colin Watson: Launchpad now runs on Python 3!

After a very long porting journey, Launchpad is finally running on Python 3 across all of our systems. I wanted to take a bit of time to reflect on why my emotional responses to this port differ so much from those of some others who've done large ports, such as the Mercurial maintainers. It's hard to deny that we've had to burn a lot of time on this, which I'm sure has had an opportunity cost, and from one point of view it's essentially running to stand still: there is no single compelling feature that we get solely by porting to Python 3, although it's clearly a prerequisite for tidying up old compatibility code and being able to use modern language facilities in the future. And yet, on the whole, I found this a rewarding project and enjoyed doing it.

Some of this may be because by inclination I'm a maintenance programmer and actually enjoy this sort of thing. My default view tends to be that software version upgrades may be a pain but it's much better to get that pain over with as soon as you can rather than trying to hold back the tide; you can certainly get involved and try to shape where things end up, but rightly or wrongly I can't think of many cases when a righteously indignant user base managed to arrange for the old version to be maintained in perpetuity so that they never had to deal with the new thing (OK, maybe Perl 5 counts here).

I think a more compelling difference between Launchpad and Mercurial, though, may be that very few other people really had a vested interest in what Python version Launchpad happened to be running, because it's all server-side code (aside from some client libraries such as launchpadlib, which were ported years ago). As such, we weren't trying to do this with the internet having Strong Opinions at us. We were doing this because it was obviously the only long-term-maintainable path forward, and in more recent times because some of our library dependencies were starting to drop support for Python 2 and so it was obviously going to become a practical problem for us sooner or later; but if we'd just stayed on Python 2 forever then fundamentally hardly anyone else would really have cared directly, only maybe about some indirect consequences of that. I don't follow Mercurial development so I may be entirely off-base, but if other people were yelling at me about how late my project was to finish its port, that in itself would make me feel more negatively about the project even if I thought it was a good idea. Having most of the pressure come from ourselves rather than from outside meant that wasn't an issue for us.

I'm somewhat inclined to think of the process as an extreme version of paying down technical debt. Moving from Python 2.7 to 3.5, as we just did, means skipping over multiple language versions in one go, and if similar changes had been made more gradually it would probably have felt a lot more like the typical dependency update treadmill. I appreciate why not everyone might want to think of it this way: maybe this is just my own rationalization.

Reflections on porting to Python 3

I'm not going to defend the Python 3 migration process; it was pretty rough in a lot of ways. Nor am I going to spend much effort relitigating it here, as it's already been done to death elsewhere, and as I understand it the core Python developers have got the message loud and clear by now.
At a bare minimum, a lot of valuable time was lost early in Python 3's lifetime hanging on to flag-day-type porting strategies that were impractical for large projects, when it should have been providing for "bilingual" strategies (code that runs in both Python 2 and 3 for a transitional period), which is where most libraries and most large migrations ended up in practice. For instance, the early advice to library maintainers to maintain two parallel versions or perhaps translate dynamically with 2to3 was entirely impractical in most non-trivial cases and wasn't what most people ended up doing, and yet the idea that 2to3 is all you need still floats around Stack Overflow and the like as a result. (These days, I would probably point people towards something more like Eevee's porting FAQ as somewhere to start.)

There are various fairly straightforward things that people often suggest could have been done to smooth the path, and I largely agree: not removing the u'' string prefix only to put it back in 3.3, fewer gratuitous compatibility breaks in the name of tidiness, and so on. But if I had a time machine, the number one thing I would ask to have been done differently would be introducing type annotations in Python 2 before Python 3 branched off. It's true that it's technically possible to do type annotations in Python 2, but the fact that it's a different syntax that would have to be fixed later is offputting, and in practice it wasn't widely used in Python 2 code. To make a significant difference to the ease of porting, annotations would need to have been introduced early enough that lots of Python 2 library code used them, so that porting code didn't have to be quite so much of an exercise of manually figuring out the exact nature of string types from context.

Launchpad is a complex piece of software that interacts with multiple domains: for example, it deals with a database, HTTP, web page rendering, Debian-format archive publishing, and multiple revision control systems, and there's often overlap between domains. Each of these tends to imply different kinds of string handling. Web page rendering is normally done mainly in Unicode, converting to bytes as late as possible; revision control systems normally want to spend most of their time working with bytes, although the exact details vary; HTTP is of course bytes on the wire, but Python's WSGI interface has some string type subtleties. In practice I found myself thinking about at least four string-like types (that is, things that in a language with a stricter type system I might well want to define as distinct types and restrict conversion between them): bytes, text, ordinary native strings (str in either language, encoded to UTF-8 in Python 2), and native strings with WSGI's encoding rules. Some of these are emergent properties of writing in the intersection of Python 2 and 3, which is effectively a specialized language of its own without coherent official documentation, whose users must intuit its behaviour by comparing multiple sources of information, or by referring to unofficial porting guides: not a very satisfactory situation. Fortunately much of the complexity collapses once it becomes possible to write solely in Python 3.

Some of the difficulties we ran into are not ones that are typically thought of as Python 2-to-3 porting issues, because they were changed later in Python 3's development process.
For instance, the email module was substantially improved in around the 3.2/3.3 timeframe to handle Python 3's bytes/text model more correctly, and since Launchpad sends quite a few different kinds of email messages and has some quite picky tests for exactly what it emits, this entailed a lot of work in our email sending code and in our test suite to account for that. (It took me a while to work out whether we should be treating raw email messages as bytes or as text; bytes turned out to work best.) 3.4 made some tweaks to the implementation of quoted-printable encoding that broke a number of our tests in ways that took some effort to fix, because the tests needed to work on both 2.7 and 3.5. The list goes on. I got quite proficient at digging through Python's git history to figure out when and why some particular bit of behaviour had changed.

One of the thorniest problems was parsing HTTP form data. We mainly rely on zope.publisher for this, which in turn relied on cgi.FieldStorage; but cgi.FieldStorage is badly broken in some situations on Python 3. Even if that bug were fixed in a more recent version of Python, we can't easily use anything newer than 3.5 for the first stage of our port due to the version of the base OS we're currently running, so it wouldn't help much. In the end I fixed some minor issues in the multipart module (and was kindly given co-maintenance of it) and converted zope.publisher to use it. Although this took a while to sort out, it seems to have gone very well.

A couple of other interesting late-arriving issues were around pickle. For most things we normally prefer safer formats such as JSON, but there are a few cases where we use pickle, particularly for our session databases. One of my colleagues pointed out that I needed to remember to tell pickle to stick to protocol 2, so that we'd be able to switch back and forward between Python 2 and 3 for a while; quite right, and we later ran into a similar problem with marshal too. A more surprising problem was that datetime.datetime objects pickled on Python 2 require special care when unpickling on Python 3; rather than the approach that ended up being implemented and documented for Python 3.6, though, I preferred a custom unpickler, both so that things would work on Python 3.5 and so that I wouldn't have to risk affecting the decoding of other pickled strings in the session database.
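As an aside, the protocol pinning is simple to illustrate; a minimal sketch (the session data here is a made-up stand-in, not Launchpad's format):

import pickle

# Protocol 2 is the highest pickle protocol Python 2 understands, so pinning
# it lets Python 2 and Python 3 processes share pickled data while both are
# still in service; Python 3 would otherwise default to protocol 3 or higher.
session_data = {"user": "example", "logged_in": True}
blob = pickle.dumps(session_data, protocol=2)

assert pickle.loads(blob) == session_data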
General lessons

Writing this over a year after Python 2's end-of-life date, and certainly nowhere near the leading edge of Python 3 porting work, it's perhaps more useful to look at this in terms of the lessons it has for other large technical debt projects.

I mentioned in my previous article that I used the approach of an enormous and frequently-rebased git branch as a working area for the port, committing often and sometimes combining and extracting commits for review once they seemed to be ready. A port of this scale would have been entirely intractable without a tool of similar power to git rebase, so I'm very glad that we finished migrating to git in 2019. I relied on this right up to the end of the port, and it also allowed for quick assessments of how much more there was to land. git worktree was also helpful, in that I could easily maintain working trees built for each of Python 2 and 3 for comparison.

As is usual for most multi-developer projects, all changes to Launchpad need to go through code review, although we sometimes make exceptions for very simple and obvious changes that can be self-reviewed. Since I knew from the outset that this was going to generate a lot of changes for review, I structured my work to try to make it as easy as possible for my colleagues to review. This generally involved keeping most changes to a somewhat manageable size of 800 lines or less (although this wasn't always possible), and arranging commits mainly according to the kind of change they made rather than their location. For example, when I needed to fix issues with / in Python 3 being true division rather than floor division, I did so in one commit across the various places where it mattered and took care not to mix it with other unrelated changes. This is good practice for nearly any kind of development, but it was especially important here, since it allowed reviewers to consider a clear explanation of what I was doing in the commit message and then skim-read the rest of it much more quickly.

It was vital to keep the codebase in a working state at all times, and deploy to production reasonably often: this way, if something went wrong, the amount of code we had to debug to figure out what had happened was always tractable. (Although I can't seem to find it now to link to it, I saw an account a while back of a company that had taken a flag-day approach instead with a large codebase. It seemed to work for them, but I'm certain we couldn't have made it work for Launchpad.)

I can't speak too highly of Launchpad's test suite, much of which originated before my time. Without a great deal of extensive coverage of all sorts of interesting edge cases at both the unit and functional level, and a corresponding culture of maintaining that test suite well when making new changes, it would have been impossible to be anything like as confident of the port as we were.

As part of the porting work, we split out a couple of substantial chunks of the Launchpad codebase that could easily be decoupled from the core: its Mailman integration and its code import worker. Both of these had substantial dependencies with complex requirements for porting to Python 3, and arranging to be able to do these separately on their own schedule was absolutely worth it. Like disentangling balls of wool, any opportunity you can take to make things less tightly-coupled is probably going to make it easier to disentangle the rest. (I can see a tractable way forward to porting the code import worker, so we may well get that done soon. Our Mailman integration will need to be rewritten, though, since it currently depends on the Python-2-only Mailman 2, and Mailman 3 has a different architecture.)

Python lessons

Our database layer was already in pretty good shape for a port, since at least the modern bits of its table modelling interface were already strict about using Unicode for text columns. If you have any kind of pervasive low-level framework like this, then making it be pedantic at you in advance of a Python 3 port will probably incur much less swearing in the long run, as you won't be trying to deal with quite so many bytes/text issues at the same time as everything else.

Early in our port, we established a standard set of __future__ imports and started incrementally converting files over to them, mainly because we weren't yet sure what else to do and it seemed likely to be helpful. absolute_import was definitely reasonable (and not often a problem in our code), and print_function was annoying but necessary. In hindsight I'm not sure about unicode_literals, though. For files that only deal with bytes and text it was reasonable enough, but as I mentioned above there were also a number of cases where we needed literals of the language's native str type, i.e. bytes in Python 2 and text in Python 3: this was particularly noticeable in WSGI contexts, but also cropped up in some other surprising places. We generally either omitted unicode_literals or used six.ensure_str in such cases, but it was definitely a bit awkward, and maybe I should have listened more to people telling me it might be a bad idea.
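For illustration, the native-str pattern looks something like this (a generic sketch, not Launchpad code):

from __future__ import unicode_literals

import six

# With unicode_literals in effect every literal is text, but some APIs
# (WSGI notably) want the native str type: bytes on Python 2, text on
# Python 3. six.ensure_str converts to whichever type is native.
status = six.ensure_str("200 OK")
assert isinstance(status, str)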
A lot of Launchpad's early tests used doctest, mainly in the style where you have text files that interleave narrative commentary with examples. The development team later reached consensus that this was best avoided in most cases, but by then there were far too many doctests to conveniently rewrite in some other form. Porting doctests to Python 3 is really annoying. You run into all the little changes in how objects are represented as text (particularly u'...' versus '...', but plenty of other cases as well); you have next to no tools to do anything useful like skipping individual bits of a doctest that don't apply; using __future__ imports requires the rather obscure approach of adding the relevant names to the doctest's globals in the relevant DocFileSuite or DocTestSuite; dealing with many exception tracebacks requires something like zope.testing.renormalizing; and whatever code refactoring tools you're using probably don't work properly. Basically, don't do that. It did all turn out to be tractable for us in the end, and I managed to avoid using much in the way of fragile doctest extensions aside from the aforementioned zope.testing.renormalizing, but it was not an enjoyable experience.

Regressions

I know of nine regressions that reached Launchpad's production systems as a result of this porting work; of course there were various other regressions caught by CI or in manual testing. (Considering the size of this project, I count it as a resounding success that there were only nine production issues, and that for the most part we were able to fix them quickly.)

Equality testing of removed database objects

One of the things we had to do while porting to Python 3 was to implement the __eq__, __ne__, and __hash__ special methods for all our database objects. This was quite conceptually fiddly, because doing this requires knowing each object's primary key, and that may not yet be available if we've created an object in Python but not yet flushed the actual INSERT statement to the database (most of our primary keys are auto-incrementing sequences). We thus had to take care to flush pending SQL statements in such cases in order to ensure that we know the primary keys. However, it's possible to have a problem at the other end of the object lifecycle: that is, a Python object might still be reachable in memory even though the underlying row has been DELETEd from the database. In most cases we don't keep removed objects around for obvious reasons, but it can happen in caching code, and buildd-manager crashed as a result (in fact while it was still running on Python 2). We had to take extra care to avoid this problem.
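Schematically, equality for such objects looks something like this (a simplified stand-in, not Launchpad's actual implementation):

class DatabaseObject:
    # Equality by (type, primary key), as is typical for ORM rows.

    def __init__(self, id=None):
        self.id = id  # None until the INSERT has been flushed

    def _require_id(self):
        if self.id is None:
            # The real system would flush pending SQL here so that the
            # auto-incremented primary key becomes known.
            raise RuntimeError("flush pending INSERTs before comparing")
        return self.id

    def __eq__(self, other):
        return (type(self) is type(other)
                and self._require_id() == other._require_id())

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash((type(self), self._require_id()))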
Debian imports crashed on non-UTF-8 filenames

Python 2 has some unfortunate behaviour around passing bytes or Unicode strings (depending on the platform) to shutil.rmtree, and the combination of some porting work and a particular source package in Debian that contained a non-UTF-8 file name caused us to run into this. The fix was to ensure that the argument passed to shutil.rmtree is a str regardless of Python version. We'd actually run into something similar before: it's a subtle porting gotcha, since it's quite easy to end up passing Unicode strings to shutil.rmtree if you're in the process of porting your code to Python 3, and you might easily not notice if the file names in your tests are all encoded using UTF-8.

lazr.restful ETags

We eventually got far enough along that we could switch one of our four appserver machines (we have quite a number of other machines too, but the appservers handle web and API requests) to Python 3 and see what happened. By this point our extensive test suite had shaken out the vast majority of the things that could go wrong, but there was always going to be room for some interesting edge cases. One of the Ubuntu kernel team reported that they were seeing an increase in 412 Precondition Failed errors in some of their scripts that use our webservice API. These can happen when you're trying to modify an existing resource: the underlying protocol involves sending an If-Match header with the ETag that the client thinks the resource has, and if this doesn't match the ETag that the server calculates for the resource then the client has to refresh its copy of the resource and try again. We initially thought that this might be legitimate, since it can happen in normal operation if you collide with another client making changes to the same resource, but it soon became clear that something stranger was going on: we were getting inconsistent ETags for the same object even when it was unchanged. Since we'd recently switched a quarter of our appservers to Python 3, that was a natural suspect. Our lazr.restful package provides the framework for our webservice API, and roughly speaking it generates ETags by serializing objects into some kind of canonical form and hashing the result. Unfortunately the serialization was dependent on the Python version in a few ways, and in particular it serialized lists of strings, such as lists of bug tags, differently: Python 2 used [u'foo', u'bar', u'baz'] where Python 3 used ['foo', 'bar', 'baz']. In lazr.restful 1.0.3 we switched to using JSON for this, removing the Python version dependency and ensuring consistent behaviour between appservers.
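Schematically, the version-independent approach looks something like this (a sketch of the idea, not lazr.restful's actual code):

import hashlib
import json

def make_etag(resource_state):
    # repr()-based serialization differs between Python 2 (u'foo') and
    # Python 3 ('foo'); JSON with sorted keys is a canonical form that
    # hashes identically on both versions.
    canonical = json.dumps(resource_state, sort_keys=True)
    return hashlib.sha1(canonical.encode("UTF-8")).hexdigest()

print(make_etag({"tags": ["foo", "bar", "baz"]}))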
Memory leaks

This problem took the longest to solve. We noticed fairly quickly from our graphs that the appserver machine we'd switched to Python 3 had a serious memory leak. Our appservers had always been a bit leaky, but now it wasn't so much "a small hole that we can bail occasionally" as "the boat is sinking rapidly":

[graph: a serious memory leak]

(Yes, this got in the way of working out what was going on with ETags for a while.) I spent ages messing around with various attempts to fix this. Since only a quarter of our appservers were affected, and we could get by on 75% capacity for a while, it wasn't urgent but it was definitely annoying. After spending some quality time with objgraph, for some time I thought traceback reference cycles might be at fault, and I sent a number of fixes to various upstream projects for those (e.g. zope.pagetemplate). Those didn't help the leaks much though, and after a while it became clear to me that this couldn't be the sole problem: Python has a cyclic garbage collector that will eventually collect reference cycles as long as there are no strong references to any objects in them, although it might not happen very quickly. Something else must be going on. Debugging reference leaks in any non-trivial and long-running Python program is extremely arduous, especially with ORMs that naturally tend to end up with lots of cycles and caches. After a while I formed a hypothesis that zope.server might be keeping a strong reference to something, although I never managed to nail it down more firmly than that. This was an attractive theory, as we were already in the process of migrating to Gunicorn for other reasons anyway, and Gunicorn also has a convenient max_requests setting that's good at mitigating memory leaks. Getting this all in place took some time, but once we did we found that everything was much more stable:

[graph: a rather flat memory graph]

This isn't completely satisfying, as we never quite got to the bottom of the leak itself, and it's entirely possible that we've only papered over it using max_requests: I expect we'll gradually back off on how frequently we restart workers over time to try to track this down. However, pragmatically, it's no longer an operational concern.

Mirror prober HTTPS proxy handling

After we switched our script servers to Python 3, we had several reports of mirror probing failures. (Launchpad keeps lists of Ubuntu archive and image mirrors, and probes them every so often to check that they're reasonably complete and up to date.) This only affected HTTPS mirrors when probed via a proxy server, support for which is a relatively recent feature in Launchpad and involved some code that we never managed to unit-test properly: of course this is exactly the code that went wrong. Sadly I wasn't able to sort out that gap, but at least the fix was simple.

Non-MIME-encoded email headers

As I mentioned above, there were substantial changes in the email package between Python 2 and 3, and indeed between minor versions of Python 3. Our test coverage here is pretty good, but it's an area where it's very easy to have gaps. We noticed that a script that processes incoming email was crashing on messages with headers that were non-ASCII but not MIME-encoded (and indeed then crashing again when it tried to send a notification of the crash!). The only examples of these I looked at were spam, but we still didn't want to crash on them. The fix involved being somewhat more careful about both the handling of headers returned by Python's email parser and the building of outgoing email notifications. This seems to be working well so far, although I wouldn't be surprised to find the odd other incorrect detail in this sort of area.

Failure to handle non-ISO-8859-1 URL-encoded form input

Remember how I said that parsing HTTP form data was thorny? After we finished upgrading all our appservers to Python 3, people started reporting that they couldn't post Unicode comments to bugs, which turned out to happen only if the attempt was made using JavaScript, and was because I hadn't quite managed to get URL-encoded form data working properly with zope.publisher and multipart. The current standard describes the URL-encoded format for form data as "in many ways an aberrant monstrosity", so this was no great surprise. Part of the problem was some very strange choices in zope.publisher dating back to 2004 or earlier, which I attempted to clean up and simplify. The rest was that Python 2's urlparse.parse_qs unconditionally decodes percent-encoded sequences as ISO-8859-1 if they're passed in as part of a Unicode string, so multipart needs to work around this on Python 2. I'm still not completely confident that this is correct in all situations, but at least now that we're on Python 3 everywhere the matrix of cases we need to care about is smaller.
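A minimal sketch of the Python 2 pitfall in question (illustrative only; multipart's actual workaround is more involved):

try:  # Python 2
    from urlparse import parse_qs
except ImportError:  # Python 3
    from urllib.parse import parse_qs

# "%C3%A9" is "é" percent-encoded as UTF-8.
qs = u"comment=caf%C3%A9"

# On Python 2, passing a unicode string makes parse_qs decode the percent
# escapes as ISO-8859-1, yielding u'caf\xc3\xa9' (mojibake); Python 3
# defaults to UTF-8 and yields 'café'. The workaround on Python 2 is to
# parse the byte string instead and decode each value as UTF-8 afterwards.
print(parse_qs(qs)["comment"][0])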
Inconsistent marshalling of Loggerhead's disk cache

We use Loggerhead for providing web browsing of Bazaar branches. When we upgraded one of its two servers to Python 3, we immediately noticed that the one still on Python 2 was failing to read back its revision information cache, which it stores in a database on disk. (We noticed this because it caused a deployment to fail: when we tried to roll out new code to the instance still on Python 2, Nagios checks had already caused an incompatible cache to be written for one branch from the Python 3 instance.) This turned out to be a similar problem to the pickle issue mentioned above, except this one was with marshal, which I didn't think to look for because it's a relatively obscure module mostly used for internal purposes by Python itself; I'm not sure that Loggerhead should really be using it in the first place. The fix was relatively straightforward, complicated mainly by now needing to cope with throwing away unreadable cache data. Ironically, if we'd just gone ahead and taken the nominally riskier path of upgrading both servers at the same time, we might never have had a problem here.

Intermittent bzr failures

Finally, after we upgraded one of our two Bazaar codehosting servers to Python 3, we had a report of intermittent "bzr branch" hangs. After some digging I found this in our logs:
Traceback (most recent call last):
  ...
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/conch/ssh/channel.py", line 136, in addWindowBytes
    self.startWriting()
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/lazr/sshserver/session.py", line 88, in startWriting
    resumeProducing()
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/internet/process.py", line 894, in resumeProducing
    for p in self.pipes.itervalues():
builtins.AttributeError: 'dict' object has no attribute 'itervalues'
I'd seen this before in our git hosting service: it was a bug in Twisted's Python 3 port, fixed after 20.3.0 but unfortunately after the last release that supported Python 2, so we had to backport that patch. Using the same backport dealt with this. Onwards!

16 July 2021

Dirk Eddelbuettel: RcppArmadillo 0.10.6.0.0 on CRAN: A New Upstream

Armadillo is a powerful and expressive C++ template library for linear algebra, aiming towards a good balance between speed and ease of use, with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 882 other packages on CRAN. This new release gets us Armadillo 10.6.0, which was released yesterday. We did the usual reverse-dependency checks (which came out spotless and clean), and had also just done even fuller checks for Rcpp 1.0.7. Since the previous RcppArmadillo 0.10.5.0.0 release, we made a few interim releases to the drat repo. In general, Conrad is a little more active than we want to be with our (monthly or less frequent) CRAN updates, so keep an eye on the drat repo (or follow the GitHub repo) for a higher-frequency cadence. To use the drat repo, use install.packages("RcppArmadillo", repos="https://RcppCore.github.io/drat") or update.packages() with a similar repos argument. The full set of changes follows. We include the last interim release as well.

Changes in RcppArmadillo version 0.10.6.0.0 (2021-07-16)
  • Upgraded to Armadillo release 10.6.0 (Keep Calm)
    • expanded chol() to optionally use pivoted decomposition
    • expanded vector, matrix and cube constructors to allow element initialisation via fill::value(scalar), e.g. mat X(4,5,fill::value(123))
    • faster loading of CSV files when using OpenMP
    • added csv_opts::semicolon option to allow saving/loading of CSV files with semicolon (;) instead of comma (,) as the separator

Changes in RcppArmadillo version 0.10.5.3.0 (2021-07-01)
  • Upgraded to Armadillo release 10.5.3 (Antipodean Fortress)
  • GitHub-only release
  • Extended test coverage with several new tests, added a coverage badge.

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

20 June 2021

Mike Gabriel: BBB Packaging for Debian, a short Heads-Up

Over the past days, I have received tons of positive feedback on my previous blog post about forming the Debian BBB Packaging Team [1]. Feedback arrived via mail, IRC, [matrix] and Mastodon. Awesome. Thanks for sharing your thoughts, folks... Therefore, here comes a short heads-up on the current ongoings around packaging BigBlueButton for Debian:

Credits

light+love
Mike Gabriel

[1] https://sunweavers.net/blog/node/133
[2] https://bigbluebutton.org/event-page/
[3] https://docs.google.com/document/d/1kpYJxYFVuWhB84bB73kmAQoGIS59ari1_hn2...

11 June 2021

Mike Gabriel: New: The Debian BBB Packaging Team (and: Kurento Media Server goes Debian)

Today, Fre(i)e Software GmbH has been contracted for packaging Kurento Media Server for Debian. This packaging project will be funded by GUUG e.V. (the German Unix User Group e.V.). A big thanks to the people from GUUG e.V. for making this packaging project possible.

About Kurento Media Server: Kurento is an open source software project providing a platform suitable for creating modular applications with advanced real-time communication capabilities. To learn more about Kurento, please visit the Kurento project website: https://www.kurento.org. Kurento is part of FIWARE. For further information on the relationship of FIWARE and Kurento check the Kurento FIWARE Catalog Entry. Kurento is also part of the NUBOMEDIA research initiative. Kurento Media Server is a WebRTC-compatible server that processes audio and video streams, doing composable pipeline-based processing of media.

About BigBlueButton: As some of you may know, Kurento Media Server is one of the core components of the BigBlueButton software, an „Open Source Virtual Classroom Software". The context of the KMS funding is (after several other steps) getting the complete software component stack of BigBlueButton (aka BBB) into Debian some day, so that we can provide BBB as native Debian packages. On Debian. (Currently, one needs to use an always slightly outdated version of Ubuntu.) Due to this greater context, I just created the Debian BBB Packaging Team on salsa.debian.org.

Outlook and Appreciation: The current project (uploading Kurento Media Server to Debian) will very likely be extended to one year of package maintenance for all Kurento Media Server components in Debian. Extending this maintenance funding to a second year has also been discussed, and seems a possible option. Probably most Debian Developer colleagues will agree with me when I say that Debian packaging is not a one-time shot until the first uploads of software packages have landed and settled. Debian package maintenance is a long-term responsibility and requires long-term commitment. I am very glad that the people at GUUG e.V. are on the same page with me (with us) regarding this. This is much and dearly appreciated. Thank you!!!

What else? Well, we have also talked about another BigBlueButton component that is not yet in Debian: FreeSwitch. But more on that when the time comes.

How to join the Debian BBB Packaging Team? Please ping me via IRC (sunweaver on OFTC IRC) or [matrix] (@sunweaver:matrix.org).

How to support the Debian BBB Packaging Team? If you, your organization, your company, your municipality, your university, etc. feels like supporting the effort of packaging BigBlueButton for Debian, please get in touch with: mike.gabriel@freiesoftware.gmbh. And yes, the company homepage is not online yet, but it is in the making... light+love
Mike (aka sunweaver)

24 May 2021

Antoine Beaupré: Leaving Freenode

The freenode IRC network has been hijacked. TL;DR: move to libera.chat or OFTC.net, as did countless free software projects including Gentoo, CentOS, KDE, Wikipedia, FOSDEM, and more. Debian and the Tor project were already on OFTC and are not affected by this.

What is freenode and why should I care? freenode is the largest remaining IRC network. Before this incident, it had close to 80,000 users, which is small in terms of modern internet history -- even small social networks are larger by multiple orders of magnitude -- but is large in IRC history. The IRC network is also extensively used by the free software community, being the default IRC network on many programs, and used by hundreds if not thousands of free software projects. I have been using freenode since at least 2006. This matters if you care about IRC, the internet, open protocols, decentralisation, and, to a certain extent, federation as well. It also touches on who has the right on network resources: the people who "own" it (through money) or the people who make it work (through their labor). I am biased towards open protocols, the internet, federation, and worker power, and this might taint this analysis.

What happened? It's a long story, but basically:
  1. back in 2017, the former head of staff sold the freenode.net domain (and its related company) to Andrew Lee, "American entrepreneur, software developer and writer", and, rather weirdly, supposedly "crown prince of Korea" although that part is kind of complex (see House of Yi, Yi Won, and Yi Seok). It should be noted the Korean Empire hasn't existed for over a century at this point (even though its flag, also weirdly, remains)
  2. back then, this was only known to the public as this strange PIA and freenode joining forces gimmick. it was suspicious at first, but since the network kept running, no one paid much attention to it. opers of the network were similarly reassured that Lee would have no say in the management of the network
  3. this all changed recently when Lee asserted ownership of the freenode.net domain and started meddling in the operations of the network, according to this summary. this part is disputed, but it is corroborated by almost a dozen former staff who collectively resigned from the network in protest, after legal threats, when it was obvious freenode was lost.
  4. the departing freenode staff founded a new network, irc.libera.chat, based on the new ircd they were working on with OFTC, solanum
  5. meanwhile, bot armies started attacking all IRC networks: both libera and freenode, but also OFTC and unrelated networks like a small one I help operate. those attacks have mostly stopped as of this writing (2021-05-24 17:30UTC)
  6. on freenode, however, things are going for the worse: Lee has been accused of taking over a channel, in a grotesque abuse of power; then changing freenode policy to not only justify the abuse, but also remove rules against hateful speech, effectively allowing nazis on the network (update: the change was reverted, but not by Lee)
Update: even though the policy change was reverted, the actual conversations allowed on freenode have already degenerated into toxic garbage. There are also massive channel takeovers (presumably over 700), mostly on channels that were redirecting to libera, but also channels that were still live. Channels that were taken over include #fosdem, #wikipedia, #haskell... Instead of working on the network, the new "so-called freenode" staff is spending effort writing bots and patches to basically automate taking over channels. I run an IRC network and this bot is obviously not standard "services" stuff... This is just grotesque. At this point I agree with this HN comment:
We should stop implicitly legitimizing Andrew Lee's power grab by referring to his dominion as "Freenode". Freenode is a quarter-century-old community that has changed its name to libera.chat; the thing being referred to here as "Freenode" is something else that has illegitimately acquired control of Freenode's old servers and user database, causing enormous inconvenience to the real Freenode.
I don't agree with the suggested name there, let's instead call it "so called freenode" as suggested later in the thread.

What now? I recommend people and organisations move away from freenode as soon as possible. This is a major change: documentation needs to be fixed, and the migration needs to be coordinated. But I do not believe we can trust the new freenode "owners" to operate the network reliably and in good faith. It's also important to use the current momentum to build a critical mass elsewhere so that people don't end up on freenode again by default and find an even more toxic community than your typical run-of-the-mill free software project (which is already not a high bar to meet). Update: people are moving to libera in droves. It's now reaching 18,000 users, which is bigger than OFTC and getting close to the largest, traditional IRC networks (EFnet, Undernet, IRCnet are in the 10-20k users range). so-called freenode is still larger, currently clocking 68,000 users, but that's a huge drop from the previous count which was 78,000 before the exodus began. We're even starting to see the effects of the migration on netsplit.de. Update 2: the isfreenodedeadyet.com site is updated more frequently than netsplit and shows tons more information. It shows 25k online users for libera and 61k for so-called freenode (down from ~78k), and the trend doesn't seem to be stopping for so-called freenode. There's also a list of 400+ channels that have moved out. Keep in mind that such migrations take effect over long periods of time.

Where do I move to? The first thing you should do is to figure out which tool to use for interactive user support. There are multiple alternatives, of course -- this is the internet after all -- but here is a short list of suggestions, in preferred priority order:
  1. irc.libera.chat
  2. irc.OFTC.net
  3. Matrix.org, which bridges with OFTC and (hopefully soon) with libera as well, modern IRC alternative
  4. XMPP/Jabber also still exists, if you're into that kind of stuff, but I don't think the "chat room" story is great there, at least not as good as Matrix
Basically, the decision tree is this:
  • if you want to stay on IRC:
    • if you are already on many OFTC channels and few freenode channels: move to OFTC
    • if you are more inclined to support the previous freenode staff: move to libera
    • if you care about matrix users (in the short term): move to OFTC
  • if you are ready to leave IRC:
    • if you want the latest and greatest: move to Matrix
    • if you like XML and already use XMPP: move to XMPP
Frankly, at this point, everyone should seriously consider moving to Matrix. The user story is great, the web is a first-class user, it supports E2EE (although XMPP does as well), and has a lot of momentum behind it. It even bridges with IRC well (which is not the case for XMPP), so it is also a good option if you're worried about problems like this happening again. (Indeed, I wouldn't be surprised if similar drama happens on OFTC or libera in the future. The history of IRC is full of such epic controversies, takeovers, sabotage, attacks, technical flamewars, and other silly things. I am not sure, but I suspect a federated model like Matrix might be more resilient to conflicts like this one.) Changing protocols might mean losing a bunch of users, however: not everyone is ready to move to Matrix, for example. Graybeards like me have been using irssi for years, if not decades, and would take quite a bit of convincing to move elsewhere. I have mostly kept my channels on IRC, and moved either to OFTC or libera. In retrospect, I think I might have moved everything to OFTC had I thought about it more, because almost all of my channels are there. But I kind of expect a lot of the freenode community to move to libera, so I am keeping a socket open there anyways.
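For fun, the decision tree above can be restated as a few lines of Python (purely illustrative; the helper function is hypothetical and the rules are just the ones listed earlier):

# Hypothetical helper encoding the migration decision tree above.
def pick_network(stay_on_irc, mostly_on_oftc, support_former_staff,
                 want_latest_and_greatest, like_xmpp):
    if stay_on_irc:
        if mostly_on_oftc:
            return "irc.oftc.net"      # most of your channels are already there
        if support_former_staff:
            return "irc.libera.chat"   # run by the departed freenode staff
        return "irc.oftc.net"          # short-term Matrix bridging works today
    if want_latest_and_greatest:
        return "Matrix"
    if like_xmpp:
        return "XMPP"
    return "Matrix"                    # the general recommendation above

print(pick_network(True, False, True, False, False))  # irc.libera.chat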

How do I move? The first thing you should do is to update documentation, websites, and source code to stop pointing at freenode altogether. This is what I did for feed2exec, for example. You need to let people know in the current channel as well, and possibly shut down the channel on freenode. Since my channels are either small or empty, I took the radical approach of:
  • redirecting the channel to ##unavailable which is historically the way we show channels have moved to another network
  • make the channel invite-only (which effectively enforces the redirection)
  • kicking everyone out of the channel
  • kickban people who rejoin
  • set the topic to announce the change
In IRC speak, the following commands should do all this:
/msg ChanServ set #anarcat mlock +if ##unavailable
/msg ChanServ clear #anarcat users moving to irc.libera.chat
/msg ChanServ set #anarcat restricted on
/topic #anarcat this channel has moved to irc.libera.chat
If the channel is not registered, the following might work
/mode #anarcat +if ##unavailable
Then you can leave freenode altogether:
/disconnect Freenode unacceptable hijack, policy changes and takeovers. so long and thanks for all the fish.
Keep in mind that some people have been unable to set up such redirections, because the new freenode staff have taken over their channel, in which case you're out of luck... Some people have expressed concern about their private data hosted at freenode as well. If you care about this, you can always talk to NickServ and DROP your nick. Be warned, however, that this assumes good faith of the network operators, which, at this point, is kind of futile. I would assume any data you have registered on there (typically: your NickServ password and email address) to be compromised and leaked. If your password is used elsewhere (tsk, tsk), change it everywhere. Update: there's also another procedure, similar to the above, but with a different approach. Keep in mind that so-called freenode staff are actively hijacking channels for the mere act of mentioning libera in the channel topic, so tread carefully there.

Last words This is a sad time for IRC in general, and freenode in particular. It's a real shame that the previous freenode staff have been kicked out, and it's especially horrible that the new policies of the network are basically making it open to nazis. I wish things had gone differently: now we have yet another fork in IRC history. While it's not the first time freenode changes name (it was called OPN before), this time the old name is still around, which will bring much confusion to the world, especially since the new freenode staff is still claiming to support FOSS. I understand there are many sides to this story, and some people were deeply hurt by all this. But for me, it's completely unacceptable to keep pushing your staff so hard that they basically all (except one?) resign in protest. For me, that's leadership failure of the utmost kind, and a complete disgrace. And of course, I can't in good conscience support or join a network that allows hate speech. Regardless of the fate of whatever we'll call what's left of freenode, maybe it's time for this old IRC thing to die already. It's still a sad day in internet history, but then again, maybe IRC will never die...

Dirk Eddelbuettel: RcppArmadillo 0.10.5.0.0 on CRAN: New Upstream

armadillo image Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use with a syntax deliberately close to a Matlab. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 865 other packages on CRAN. This new release brings Armadillo 10.5.0 which was released early on Friday. We had done one full test on the 10.5 rc1 prerelease one week earlier, and did another test on 10.5.0 and this 0.10.5.0.0 RcppArmadillo release just for added rigour. The package was then uploaded to CRAN late Friday (my timezone). The automated process flagged one NOTE as a false positive (yet another instance of the well-known (yet dreaded) issue of Suggests != Depends by one of these 865 packages). This led to an inspection by one of the CRAN maintainers and, the weekend being the weekend, it was only processed just now. Upstream moves at a speed that is a little faster than the cadence CRAN likes. As we had released RcppArmadillo 0.10.4.0.0 on April 13 we did not want to follow up too soon thereafter with 0.10.4.1.0, which was thus only a GitHub and drat release (which can always be had easily via install.packages("RcppArmadillo", repos="https://RcppCore.github.io/drat")). The full set of changes follows. We include the aforementioned interim release as well.

Changes in RcppArmadillo version 0.10.5.0.0 (2021-05-21)
  • Upgraded to Armadillo release 10.5 (Antipodean Fortress)
    • added .clamp() member function
    • expanded the standalone clamp() function to handle complex values
    • more efficient use of OpenMP
    • vector, matrix and cube constructors now initialise elements to zero by default; use the fill::none specifier, eg. mat X(4,5,fill::none), to disable element initialisation
  • Added codecov.yml to exclude Armadillo from coverage analysis

Changes in RcppArmadillo version 0.10.4.1.0 (2021-04-23)
  • Upgraded to Armadillo release 10.4.1 (Pressure Cooker)
  • GitHub-only release

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

4 May 2021

Erich Schubert: Machine Learning Lecture Recordings

I have uploaded most of my Machine Learning lecture to YouTube. The slides are in English, but the audio is in German. Some very basic contents (e.g., a demo of standard k-means clustering) were left out from this advanced class, and instead only a link to recordings from an earlier class was given. In this class, I wanted to focus on the improved (accelerated) algorithms instead. These are not included here (yet). I believe there are some contents covered in this class that you will find nowhere else (yet). The first unit is pretty long (I did not split it further yet). The later units are shorter recordings. ML F1: Principles in Machine Learning; ML F2/F3: Correlation does not Imply Causation & Multiple Testing Problem; ML F4: Overfitting - Überanpassung; ML F5: Fluch der Dimensionalität - Curse of Dimensionality; ML F6: Intrinsische Dimensionalität - Intrinsic Dimensionality; ML F7: Distanzfunktionen und Ähnlichkeitsfunktionen; ML L1: Einführung in die Klassifikation; ML L2: Evaluation und Wahl von Klassifikatoren; ML L3: Bayes-Klassifikatoren; ML L4: Nächste-Nachbarn-Klassifikation; ML L5: Nächste Nachbarn und Kerndichteschätzung; ML L6: Lernen von Entscheidungsbäumen; ML L7: Splitkriterien bei Entscheidungsbäumen; ML L8: Ensembles und Meta-Learning: Random Forests und Gradient Boosting; ML L9: Support Vector Machinen - Motivation; ML L10: Affine Hyperebenen und Skalarprodukte - Geometrie für SVMs; ML L11: Maximum Margin Hyperplane - die breitestmögliche Straße; ML L12: Training Support Vector Machines; ML L13: Non-linear SVM and the Kernel Trick; ML L14: SVM Extensions and Conclusions; ML L15: Motivation of Neural Networks; ML L16: Threshold Logic Units; ML L17: General Artificial Neural Networks; ML L18: Learning Neural Networks with Backpropagation; ML L19: Deep Neural Networks; ML L20: Convolutional Neural Networks; ML L21: Recurrent Neural Networks and LSTM; ML L22: Conclusion Classification; ML U1: Einleitung Clusteranalyse; ML U2: Hierarchisches Clustering; ML U3: Accelerating HAC mit Anderberg's Algorithmus; ML U4: k-Means Clustering; ML U5: Accelerating k-Means Clustering; ML U6: Limitations of k-Means Clustering; ML U7: Extensions of k-Means Clustering; ML U8: Partitioning Around Medoids (k-Medoids); ML U9: Gaussian Mixture Modeling (EM Clustering); ML U10: Gaussian Mixture Modeling Demo; ML U11: BIRCH and BETULA Clustering; ML U12: Motivation Density-Based Clustering (DBSCAN); ML U13: Density-reachable and density-connected (DBSCAN Clustering); ML U14: DBSCAN Clustering; ML U15: Parameterization of DBSCAN; ML U16: Extensions and Variations of DBSCAN Clustering; ML U17: OPTICS Clustering; ML U18: Cluster Extraction from OPTICS Plots; ML U19: Understanding the OPTICS Cluster Order; ML U20: Spectral Clustering; ML U21: Biclustering and Subspace Clustering; ML U22: Further Clustering Approaches

1 May 2021

Ingo Juergensmann: The Fediverse: What About Resources?

Today is May 1st. In about two weeks, on May 15th, WhatsApp will put their changed Terms of Service into action, and if you don't accept their rules you won't be able to use WhatsApp any longer. Early this year there was already a strong movement away from WhatsApp towards other solutions. Mainly to Signal, but also some other services like the Fediverse gained some new users. And XMPP also got its fair share of new users. So, what to do about the WhatsApp ToS change then? Shall we all go to Signal? Surely not. Signal is another vendor lock-in silo. It's centralistic, and recent development plans want to implement some crypto payment system. Even Bruce Schneier thinks that this is a bad idea. Other alternatives often named include Matrix/Element or XMPP. Today, Don di Dislessia in the (German) Fediverse asked about the power consumption of the Fediverse, incl. Matrix and XMPP, and how much renewable energy is being used. Of course there is no easy answer to this question, but I tried my best, at least for my own server. Here are my findings and conclusions.

Power
screenshot showing power consumption of server
Currently my server in the colocation is using about 93W on average, with a 6-core Xeon E5-2630L, 128 GB RAM, 4x 2 TB WD Red + 1 Samsung 960 Pro NVMe. The server is 7 years old. When I started with that server the power consumption was about 75W, but back then there were far fewer users on the server. So, about 20W more over the past years.

Users

I'm running my Friendica node on Nerdica.net since 2013. Over the years it became one of the largest Friendica servers in the Fediverse; for some time it was the largest one. It currently has about 700 total users and 180 monthly active users. My Mastodon instance on Nerdculture.de has about 1000 total users and about 300 monthly active users. Since last year I also run a Matrix-Synapse server. Although I invited my family, I'm in fact the only active user on that server and have joined some channels. My XMPP server is even older than my Friendica node. For a long time I had maybe 20 users. Since I set up a new website and added some domains like hookipa.net and xmpp.social, the user count increased; currently I have about 130 users on those two domains and maybe 50 monthly active users. Also note that all my Friendica and Mastodon users can use XMPP with their accounts, but they won't be counted the same way as native users on ejabberd, because the auth backend is different. So, let's assume I have about 2000 total users and 500 monthly active users.

CPU, Database Sizes and Disk I/O

Let's have a look at how many resources are being used by those users.

Database sizes:

CPU times according to xentop:

Friendica uses the largest database and causes the most disk I/O on the NVMe, but it's difficult to differentiate the load between the web apps on the webserver. So, let's have a quick look at a simple metric:

Number of lines in the webserver logfile:

These metrics correlate to some degree with the database I/O load, at least for Friendica. If you take into account the number of users, things look quite different.

Conclusion

Overall, my personal impression is that Matrix is really bad in regard to resource usage. Given that I'm the only active user, it uses exceptionally many resources. When you also consider that Matrix uses a distributed database for its chat rooms, you can assume that the resource usage is multiplied across the network, making things even worse. Friendica uses a large database and many disk accesses, but has a fairly large user base, so it seems OK, though it should of course be improved. Mastodon seems to be quite good, considering the database size, the number of log lines and the user count. XMPP turns out to be the most efficient contestant in this comparison: it uses far fewer CPU cycles and much less database disk I/O. Of course, Mastodon/Friendica are different services than XMPP or Matrix. So, coming back to the initial question about alternatives to WhatsApp, the answer for me is: you should prefer XMPP over Matrix, if only for reasons of saving resources and thus reducing power consumption. Less power consumption also means a smaller ecological footprint and fewer CO2 emissions for your communication with your family and friends. XMPP is surely not the perfect replacement for WhatsApp, but I think it is the best thing to recommend. As said above, I don't think that Signal is a viable option. It's just another proprietary silo with all the problems that come with it. Matrix is a resource hog and is not a messenger but an MS Teams replacement. Element as the main Matrix client is laggy and not multi-account/multi-server capable.
Other Matrix clients do support multiple accounts but are not as feature-complete as Element. In the end the Matrix ecosystem will suffer from the same issues XMPP already did a decade ago. But XMPP has learned to deal with them. XMPP has also been proceeding fast in the last years and has solved many problems people are still complaining about. Sure, there are still some open issues. The situation on iOS is still not as good as on Android with Conversations, but it is fairly close. There are many efforts to improve XMPP. There is Quicksy IM, a service that will use your phone number as Jabber ID/JID and is thus comparable to Signal, which uses phone numbers as unique identifiers as well. But Quicksy is compatible with XMPP standards. Snikket is another new XMPP ecosystem aiming at smaller groups hosting their own server by simply installing a Docker container and setting up some basic SRV records in the DNS. Or there is Mailcow, a Docker-based mailserver setup that added an XMPP server as well, so you can have the same mail and XMPP address. Snikket even got EU-based funding for implementing XMPP Account Portability, which will improve decentralization even further. Additionally, XMPP helps vaccination efforts in Canada and the USA with the vaxbot by Monal. Be smart and use eco-friendly infrastructure.
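As a back-of-the-envelope check on those numbers, here is a hedged sketch of the yearly energy this works out to (the 93W draw and the 500 monthly active users are the figures quoted above; the rest is simple arithmetic):

# Rough annual energy use of the server and the per-user share.
watts = 93                                  # measured average draw
users = 500                                 # monthly active users (estimate)
kwh_per_year = watts * 24 * 365 / 1000      # about 815 kWh per year
print(f"{kwh_per_year:.0f} kWh/year total, "
      f"{kwh_per_year / users:.1f} kWh/year per active user")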

29 March 2021

Anton Gladky: 2021/03, FLOSS activity

LTS This is my first official month (besides some test time last year) of working for LTS. I was assigned 12 hours and worked all of them. I could relatively easily set up the development environment for Debian Stretch and managed to release several DLAs.

Released DLAs
  1. DLA-2588-1 zeromq3_4.2.1-4+deb9u4
    • CVE-2021-20234
    • CVE-2021-20235
  2. DLA-2594-1 tomcat8_8.5.54-0+deb9u6
    • CVE-2021-24122
    • CVE-2021-25122
    • CVE-2021-25329
  3. DLA-2605-1 mariadb-10.1_10.1.48-0+deb9u2
    • CVE-2021-27928

CVE-2020-11997 I investigated CVE-2020-11997, which was marked as a guacamole-server issue. There was not much information about this CVE. I tried to analyze the git log and the git diff between the affected and fixed versions, without any visible success. After that I contacted upstream, and they were very responsive! This CVE affects guacamole-client only, and the ancient versions in the archive are very difficult to fix. So I decided to mark this CVE as NOT-FOR-US.

Repositories with pipelines For most of the packages I touched during LTS work, new repositories were created in the LTS packages group on salsa.d.o with CI pipelines enabled. This really helps to test updates, though some tests need to be disabled to get the pipelines passing.

LTS-Meeting I attended the Debian LTS team IRC-meeting.

Debian Science Team I have prepared and uploaded the following packages, which are maintained under the umbrella of the Debian Science Team:
  • gmsh_4.7.1+ds1-5
  • vtk7_7.1.1+dfsg2-10
  • gl2ps_1.4.2+dfsg1-1~bpo10+1
  • vtk9_9.0.1+dfsg1-8~bpo10+2
  • sundials_5.7.0+dfsg-1~exp1

22 February 2021

John Goerzen: Recovering Our Lost Free Will Online: Tools and Techniques That Are Available Now

As I've been thinking and writing about privacy and decentralization lately, I had a conversation with a colleague this week, and he commented about how loss of privacy is related to loss of agency: that is, loss of our ability to make our own choices, pursue our own interests, and be master of our own attention. In terms of telecommunications, we have never really been free, though in terms of the Internet and its predecessors, there have been times where we had a lot more choice. Many are too young to remember this, and for others, that era is a distant memory. The irony is that our present moment is one of enormous consolidation of power, and yet also one of a proliferation of technologies that let us wrest back some of that power. In this post, I hope to enlighten or remind us of some of the choices we have lost and also talk about the ways in which we can choose to regain them, already, right now. I will talk about the possibilities, the big dreams that are possible now, and then go into more detail about the solutions.

The Problems & Possibilities

The limitations of online: We make the assumption that we must be online to exchange data. This is reinforced by many modern protocols; Twitter clients, for instance, don't tend to let you make posts by relaying them through disconnected devices. What would it be like if you could fully participate in global communities without a constant Internet connection? If you could share photos with your friends, read the news, read your email, etc., even if you don't have a connection at present? Even if the device you use to do that never has a connection, but can route messages via other devices that do? Would it surprise you to learn that this was once the case? Back in the days of UUCP, much email and Usenet news (a global discussion forum that didn't require an Internet connection) was relayed via occasional calls over phone lines. This technology remains with us, and has even improved. Sadly, many modern protocols make no effort in this regard. Some email clients will let you compose messages offline to send when you get online later, but the assumption always is that you will be connected to an IP network again soon. NNCP, on the other hand, lets you relay messages over TCP, a radio, a satellite, or a USB stick. Email and Usenet, since they were designed in an era where store-and-forward was valued, can actually still be used in an entirely offline fashion (without ever touching an IP-based network). All it takes is for someone to care to make it happen. You can even still do it over UUCP if you like.

The physical and data link layers: Many of us just accept that we communicate in a few ways: Wifi for short distances, and then cable modems or DSL for our local Internet connection, and then many people are fuzzy about what happens after that. Or, alternatively, we have 4G phones that are the local Internet connection, and the same fuzzy things happen after. Think about this for a moment. Which of these do you control in any way? Sometimes just wifi, sometimes maybe you have choices of local Internet providers. After that, your traffic is handled by enormous infrastructure companies. There is choice here. People in ham radio have been communicating digitally over long distances without the support of the traditional Internet for decades, but the technology to do this is now more accessible to anyone.
Long-distance radio has had tremendous innovation in the last decade; cheap radios can now communicate over several miles/km without any other infrastructure at all. We all carry around radios (Wifi and Bluetooth) in our pockets that don't have to be used as mere access points to the Internet or as drivers of headphones, but can also form their own networks directly (Briar). Meshtastic is an example; it's an instant messenger that can form a mesh over many miles/km and requires no IP infrastructure at all. Briar is similar. XBee radios form a mesh in hardware, allowing peers to reach each other (also over many miles/km) with a serial or framed protocol.

Loss of peer-to-peer: Back in the late 90s, I worked at a university. I had a 386 on my desk for a workstation (not a powerful computer even then). But I put the boa webserver on it and could just serve pages on the Internet. I didn't have to get permission. Didn't have to pay a hosting provider. I could just DO it. And of course that is because the university had no firewall and no NAT. Every PC at the university was a full participant on the Internet as much as the servers at Microsoft or DEC. All I needed was a DNS entry. I could run my own SMTP server if I wanted, run a web or Gopher server, and that was that. There are many reasons why this changed. Nowadays most residential ISPs will block SMTP for their customers, and if they didn't, others would; large email providers have decided not to federate with IPs in residential address spaces. Most people have difficulty even getting a static IP address in the first place. Many are behind firewalls, NATs, or both, meaning that incoming connections of any kind are problematic. Do you see what that means? It has weakened the whole point of the Internet being a network of peers. While IP still acts that way, as a practical matter, there are clients that are prevented from being servers by administrative policy they have no control over. Imagine if you, a person with an Internet connection to your laptop or phone, could just decide to host a website, or a forum on it. For moderate levels of load, they are certainly capable of this. The only thing in the way is the network management policies you can't control. Elaborate technologies exist to try to bridge this divide, and some, like Tor or cjdns, can work quite well. More on this below.

Expense of running something popular: Related to the loss of peer-to-peer infrastructure is the very high cost of hosting something popular. Do you want to share videos with lots of people? That almost certainly is going to require expensive equipment and bandwidth. There is a reason that there are only a small handful of popular video streaming sites online. It requires a ton of money to host videos at scale. What if it didn't? What if you could achieve economies of scale so much that you, an individual, could compete with the likes of YouTube? You wouldn't necessarily have to run ads to support the service. You wouldn't have to have billions of dollars or billions of viewers just to make it work. This technology exists right now. Of course many of you are aware of how Bittorrent leverages the swarm for files. But projects like IPFS, Dat, and Peertube have taken this many steps further to integrate it into a global ecosystem. And, at least in the case of Peertube, this is a thing that works right now in any browser already!

Application-level walled gardens: I was recently startled at how much excitement there was when Github introduced "dark mode".
Yes, Github now offers two colors on its interface. Already back in the 80s and 90s, many DOS programs had more options than that. Git is a decentralized protocol, but Github has managed to make it centralized. Email is a decentralized protocol (pick your own provider, and they all communicate) but Facebook and Twitter aren't. You can't just pick your provider for Facebook. It's Facebook or nothing. There is a profit motive in locking others out; these networks want to keep you using their platforms because their real customers are advertisers, and they want to keep showing you ads. Is it possible to have a world where you get to pick your own app for sharing photos, and it works even if your parents use a different one? Yes, yes it is. Mastodon and the Fediverse are fantastic examples for social media. Pixelfed is specifically designed for photos, Mastodon for short-form communication, there's Pleroma for more long-form communication, and they all work together. You can use Mastodon to read Pleroma content or look at Pixelfed photos, and there are many (free) providers of each.

Freedom from manipulation: I recently wrote about the dangers of the attention economy, so I won't go into a lot of detail here. Fundamentally, you are not the customer of Facebook or Google; advertisers are. They optimize their site to keep you on it as much as possible so that they can show you as many ads as possible, which makes them as much money as possible. Ads, of course, are fundamentally seeking to manipulate your behavior ("buy this product"). By lowering the cost of running services, we can give a huge boost to hobbyists and nonprofits that want to do so without an ultimate profit motive. For-profit companies benefit also, with a dramatically reduced cost structure that frees them to pursue their mission instead of so many ads.

Freedom from snooping (privacy and anonymity): These days, it's not just government snooping that people think about. It's data stolen by malware, spies at corporations (whether human or algorithmic), and even things like basic privacy of one's own security footage. Here the picture is improving; encryption in transit, at least at a basic level, has become much more common with TLS being a standard these days. Sadly, end-to-end encryption (E2EE) is not nearly as common, perhaps because corporations have a profit motive to have access to your plaintext and metadata. Closely related to privacy is anonymity: that is, being able to do things in an anonymous fashion. The two are not necessarily equal: you could send an encrypted message but reveal who the correspondents are, as with email; or, you could send a plaintext message over a Tor exit node that hides who the correspondents are. It is sometimes difficult to achieve both. Nevertheless, numerous answers exist here that tackle one or both problems, from the Signal messenger to Tor.

Solutions That Exist Today

Let's dive in to some of the things that exist today. One concept you'll see in many of these is integrated encryption with public keys used for addressing. In other words, your public key is akin to an IP address (and in some cases, is literally your IP address). Data link and networking technologies (some including P2P): P2P Infrastructure: While some of the technologies above, such as cjdns, explicitly facilitate peer-to-peer communication, there are some other application-level technologies to look at.
Instant Messengers and Chat: I won't go into a lot of detail here since I recently wrote a roundup of secure mesh messengers and also a followup article about Signal and some hidden drawbacks of P2P. Please refer to those articles for some interesting things that are happening in this space. Matrix is a distributed IM platform similar in concept to Slack or IRC, but globally distributed in a mesh. It supports optional E2EE.

Social Media: I wrote recently about how to join the Fediverse, which covered joining Mastodon, a federated, decentralized social network. Mastodon is the largest of these, with several million users, and is something of a much nicer version of Twitter. Mastodon is also part of what is known as the "Fediverse", which are applications that are loosely joined together by their support of the ActivityPub protocol. Other popular Fediverse applications include Pixelfed (similar to Instagram) and Peertube for sharing video. Peertube is particularly interesting in that it supports Webtorrent for efficiently distributing popular videos. Webtorrent is akin to Bittorrent running efficiently inside your browser.

Concluding Remarks: Part of my goal with this is encouraging people to dream big, to ask questions like: What could you do if offline were easy? What is possible if you have freedom in the physical and data link layers? Dream big. We're so used to thinking that it's quite difficult for two devices on the Internet to talk to each other. What would be possible if this were actually quite easy? The assumption that costs rise dramatically as popularity increases is also baked into our thought processes. What if that weren't the case? Could you take on Youtube from your garage? Would lowering barriers to entry lower the ad economy and let nonprofits have more equal footing with large corporations? We have so many walled gardens, from Github to Facebook, that we almost forget it doesn't have to be that way. So having asked these questions, my secondary point is to suggest that these aren't pie-in-the-sky notions. These possibilities are with us right now. You'll notice from this list that virtually every one of these technologies is ad-free at its heart (though some would be capable of serving ads). They give you back your attention. Many preserve privacy, anonymity, or both. Many dramatically improve your freedom of association and communication. Technologies like IPFS and Bittorrent ease the burden of running something popular. Some are quite easy to use (Mastodon or Peertube) while others are much more complex (libp2p or the lower-level mesh network systems). Clearly there is still room for improvement in many areas. But my fundamental point is this: good technology is here, right now. Technical people can vote with their feet and wallets and start using it. Early adopters will help guide the way for the next set of improvements. Join us!

21 February 2021

Erich Schubert: My first Rust crate: faster kmedoids clustering

I have written my first Rust crate: kmedoids. Python users can use the wrapper package kmedoids. It implements k-medoids clustering, and includes our new FasterPAM algorithm that drastically reduces the computational overhead. As long as you can afford to compute the distance matrix of your data set, clustering it with k-medoids is now feasible even for large k. (If your data is continuous and you are interested in minimizing squared errors, k-means surely remains the better choice!) My take on Rust so far: will I use it more? I don't know. Probably if I need extreme performance, but I likely would not want to do everything myself in a pedantic language. So community is key, and I do not see Rust shine there.
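To give a feel for the Python wrapper, here is a minimal sketch; the fasterpam entry point and result fields follow the package's documented usage, but treat the exact API as an assumption and check the kmedoids documentation:

# k-medoids with FasterPAM on a toy data set, via the Python wrapper.
import numpy as np
from scipy.spatial.distance import pdist, squareform
import kmedoids

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))      # 100 points in 3 dimensions
diss = squareform(pdist(data))        # full n-by-n distance matrix

result = kmedoids.fasterpam(diss, 5)  # cluster around k=5 medoids
print("loss:", result.loss)           # sum of distances to nearest medoid
print("medoids:", result.medoids)     # indices of the chosen medoids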

31 January 2021

John Goerzen: The Hidden Drawbacks of P2P (And a Defense of Signal)

Not long ago, I posted a roundup of secure messengers with off-the-grid capabilities. Some conversation followed, which led me to consider some of the problems with P2P protocols.

P2P and Privacy: Brave adopting IPFS has driven a lot of buzz lately. IPFS is essentially a decentralized, distributed web. This concept has a lot of promise. But take a look at the IPFS privacy document. Some things to highlight: So in this case, you have traded giving information about what you request to specific sites to giving it to potentially hundreds of untrusted peers, some of which may be logging this for nefarious purposes. Worse, you have a durable PeerID that can be used for tracking and tied to your IP address: a data collector's dream. This PeerID, combined with DHT requests and the CIDs (Content IDs) of the things you host (implying you viewed them in the past), can be used to establish a picture of what you are requesting now and requested recently. Similar can be said about everything from Scuttlebutt to GNU Jami; any service that operates on a P2P basis will likely reveal your IP, and tie your identity to it (and your IP address history). In some cases, as with Jami, this would be limited to friends you add; in others, as with Scuttlebutt and IPFS, it could be revealed to anyone. The advantages of P2P are undeniable and profound, but few are effectively addressing the privacy implications. The one I know of that is, Briar, routes all traffic over Tor; every node is reached by a Tor onion service.

Federation: somewhat better. In a federated model, every client connects to a server, and there are many servers participating in a federation with each other. Matrix and Mastodon are examples of a federated model. In this scenario, only one server (your own homeserver) can track you by IP. End-to-end encryption is certainly possible in a federated model, and Matrix supports it. This does give a third party (the specific server you use) knowledge of your IP, but that knowledge can be significantly limited. A downside of this approach is that if your particular homeserver is down, you are unable to communicate. Truly decentralized P2P solutions don't have that problem, though they do have a related one, which is that clients communicating with each other must both be online simultaneously in order for messages to be transmitted, and this can be a real challenge for mobile devices.

Centralization and Signal: Signal is centralized; it has one central server farm, and if it is down, you can't communicate or choose any other server, either. We saw it go down recently after Elon Musk mentioned it. Still, I recommend Signal for the general public. Here's why. Signal brings encryption and privacy to meet people where they're at, not the other way around. People don't have to choose a server, it can automatically recognize contacts that use Signal, it has emojis, attachments, secure voice and video calling, and (aside from the Musk incident) it all just works. It feels like, and is, a polished, modern experience with the bells and whistles people are used to. I'm a huge fan of Matrix (aka Element) and even run my own instance. It has huge promise. But it is Not. There. Yet. Why do I say this about Matrix? Again, I love Matrix. I use it every day to interact with Matrix, IRC, Slack, and Discord channels. It has a ton of promise. But would I count on it to carry a "my car's broken down and I'm stranded" message? No. How about some of the other options out there? I mentioned Briar above.
It's fantastic and its offline options are novel and promising. But in common usage, it can't deliver a message unless both devices are online simultaneously, and it doesn't run on iOS (though both are being worked on). It also can't send photos or do voice or video calling. Some of these same limitations apply to most of the other Signal alternatives also: either that, or they are encryption-optional, or terribly hard to set up and use. I recently mentioned Status, which shows a ton of promise, but has no voice or video calling capabilities. Scuttlebutt is a fantastic protocol with extremely difficult onboarding (lengthy process, error-prone finding a pub, multi-GB initial download, etc.). And many of these leak IP addresses as discussed above. So Signal gives people: if you are going to tell someone "it's so EASY to get your texts away from Facebook and AT&T", then Signal is the thing you've got to point them to. It may not be in two years, but for now, it is. Do not let the perfect be the enemy of the good. It advances the status quo without harming usability, which nothing else does yet. I am aware of all of the very legitimate criticisms of Signal. They are real and they are why I am excited that there are so many alternatives with promise, some of which I use actively. Let us technical people use, debug, contribute to, and evangelize the alternatives. And while we're doing that, tell Grandma to contact us on Signal.

22 January 2021

Robert McQueen: Launching Endless OS Foundation

Passion Led Us Here

How our for-profit company became a nonprofit, to better tackle the digital divide. Originally posted on the Endless OS Foundation blog.

An 8-year journey to a nonprofit

On the 1st of April 2020, our for-profit Endless Mobile officially became a nonprofit as the Endless OS Foundation. Our launch as a nonprofit just as the global pandemic took hold was, predictably, hardly noticed, but for us the timing was incredible: as the world collectively asked "What can we do to help others in need?", we framed our mission statement and launched our .org with the same very important question in mind. Endless always had a social impact mission at its heart, and the challenges related to students, families, and communities falling further into the digital divide during COVID-19 brought new urgency and purpose to our team's decision to officially step into the social welfare space.
On April 1st 2020, our for-profit Endless Mobile officially became a nonprofit as the Endless OS Foundation, focused on the #DigitalDivide.
Our updated status was a long time coming: we began our transformation to a nonprofit organization in late 2019 with the realization that the true charter and passions of our team would be greatly accelerated without the constraints of for-profit goals, investors and sales strategies standing in the way of our mission of digital access and equity for all. But for 8 years we made a go of it commercially, headquartered in Silicon Valley and framing ourselves as a tech startup with access to the venture capital and partnerships on our doorstep. We believed that a successful commercial channel would be the most efficient way to scale the impact of bringing computer devices and access to communities in need. We still believe this; we've just learned through our experience that we don't have the funding to enter the computer and OS marketplace head-on. With the social impact goal first, and the hope of any revenue a secondary goal, we have had many successes in those 8 years bridging the digital divide throughout the world, from Brazil, to Kenya, and the USA. We've learned a huge amount which will go on to inform our strategy as a nonprofit.
Endless always had a social impact mission at its heart. COVID-19 brought new urgency and purpose to our team's decision to officially step into the social welfare space.
Our unique perspective

One thing we learned as a for-profit is that the OS and technology we've built has some unique properties which are hugely impactful as a working solution to digital equity barriers. And our experience deploying in the field around the world for 8 years has left us uniquely informed via many iterations and incremental improvements.
Endless OS designer in discussion with prospective user
With this knowledge in hand, we've been refining our strategy throughout 2020 and now starting to focus on what it really means to become an effective nonprofit and make that impact. In many ways it is liberating to abandon the goals and constraints of being a for-profit entity, and in other ways it's been a challenging journey for me and the team to adjust our way of thinking and let these for-profit notions and models go. Previously we exclusively built and sold a product that defined our success; and any impact we achieved was a secondary consequence of that success and seen through that lens. Now our success is defined purely in terms of social impact, and through our actions, those positive impacts can be made with or without our "product". That means that we may develop and introduce technology to solve a problem, but it is equally as valid to find another organization's existing offering and design a way to increase that positive impact and scale.
We develop technology to solve access equity issues, but it's equally as valid to find another organization's offering and partner in a way that increases their positive impact.
The analogy to Free and Open Source Software is very strong: while Endless has always used and contributed to a wide variety of FOSS projects, we've also had a tension where we've been trying to hold some pieces back and capture value (such as our own application or content ecosystem, or our own hardware platform), necessarily making us competitors to other organisations even though they were hoping to achieve the same things as us. As a nonprofit we can let these ideas go and just pick the best partners and technologies to help the people we're trying to reach.
School kids writing on paper
Digital equity: 4 barriers we need to overcome

In future, our decisions around which projects to build or engage with will revolve around 4 barriers to digital equity, and how our Endless OS, Endless projects, or our partners' offerings can help to solve them. We define these 4 equity barriers as: barriers to devices, barriers to connectivity, barriers to literacy in terms of your ability to use the technology, and barriers to engagement in terms of whether using the system is rewarding and worthwhile.
We define the 4 digital equity barriers we exist to impact as:
1. barriers to devices
2. barriers to connectivity
3. barriers to literacy
4. barriers to engagement
It doesn't matter who makes the solutions that break these barriers; what matters is how we assist in enabling people to use technology to gain access to the education and opportunities these barriers block. Our goal therefore is to simply ensure that solutions exist: building them ourselves and with partners such as the FOSS community and other nonprofits, proving them with real-world deployments, and sharing our results as widely as possible to allow for better adoption globally. If we define our goal purely in terms of whether people are using Endless OS, we are effectively restricting the reach and scale of our solutions to the audience we can reach directly with Endless OS downloads, installs and propagation. Conversely, partnerships that scale impact are a win-win-win for us, our partners, and the communities we all serve.

Engineering impact

Our Endless engineering roots and capabilities feed our unique ability to build and deploy all of our solutions, and the practical experience of deploying them gives us evidence and credibility as we advocate for their use. Either activity would be weaker without the other.
Our engineering roots and capabilities feed our unique ability to build and deploy digital divide solutions.
Our partners in various engineering communities will have already seen our change in approach. Particularly, with GNOME we are working hard to invest in upstream and reconcile the long-standing differences between our experience and GNOME. If successful, many more people can benefit from our work than just users of Endless OS. We're working with Learning Equality on Kolibri to build a better app experience for Linux desktop users and bring content publishers into our ecosystem for the first time, and we've also taken our very own Hack, the immersive and fun destination for kids learning to code, released it for non-Endless systems on Flathub, and made it fully open source.
Planning tasks with sticky notes on a whiteboard
What's next for our OS?

What then is in store for the future of Endless OS, the place where we have invested so much time and planning through years of iterations? For the immediate future, we need the capacity to deploy everything we've built, all at once, to our partners. We built an OS that we feel is very unique and valuable, containing a number of world-firsts: the first production OS shipped with OSTree, the first Flatpak-only desktop, built-in support for updating the OS and apps from USBs, while still providing a great deal of reliability and convenience for deployments in offline and educationally-safe environments with great apps and content loaded on every system. However, we need to find a way to deliver this Linux-based experience in a more efficient way, and we'd love to talk if you have ideas about how we can do this, perhaps as partners. Can the idea of Endless OS evolve to become a spec that is provided by different platforms in the future, maybe remixes of Debian, Fedora, openSUSE or Ubuntu?

Build, Validate, Advocate

Beyond the OS, the Endless OS Foundation has identified multiple programs to help underserved communities, and in each case we are adopting our build, validate, advocate strategy. This approach underpins all of our projects: can we build the technology (or assist in the making), will a community in need validate it by adoption, and can we inspire others by telling the story and advocating for its wider use?
We are adopting a build, validate, advocate strategy.
1. build the technology (or assist in the making)
2. validate by community adoption
3. advocate for its wider use
As examples, we have just launched the Endless Key (link) as an offline solution for students during the COVID-19 at-home distance learning challenges. This project is also establishing a first-ever partnership of well-known online educational brands to reach an underserved offline audience with valuable learning resources. We are developing a pay-as-you-go platform and new partnerships that will allow families to own laptops via micro-payments that are built directly into the operating system, even if they cannot qualify for standard retail financing. And during the pandemic, we've partnered with Teach For America to focus on very practical digital equity needs in the USA's urban and rural communities.

One part of the world-wide digital divide solution

We are one solution provider for the complex matrix of issues known collectively as the #DigitalDivide, and these issues will not disappear after the pandemic. Digital equity was an issue long before COVID-19, and we are not so naive to think it can be solved by any single institution, or by the time the pandemic recedes. It will take time and a coalition of partnerships to win. We are in it for the long haul and we are always looking for partners, especially now as we are finding our feet in the nonprofit world. We'd love to hear from you, so please feel free to reach out to me: I'm ramcq on IRC, RocketChat, Twitter, LinkedIn or rob@endlessos.org.

12 January 2021

Russell Coker: PSI and Cgroup2

In the comments on my post about Load Average Monitoring [1], an anonymous person recommended that I investigate PSI. As an aside, why do I get so many great comments anonymously? Don't people want to get credit for having good ideas and learning about new technology before others? PSI is the Pressure Stall Information subsystem for Linux that is included in kernels 4.20 and above; if you want to use it in Debian then you need a kernel from Testing or Unstable (Buster has kernel 4.19). The place to start reading about PSI is the main Facebook page about it; it was originally developed at Facebook [2]. I am a little confused by the actual numbers I get out of PSI: while for the load average I can often see where they come from (e.g. have 2 processes each taking 100% of a core and the load average will be about 2), it's difficult to work out where the PSI numbers come from. For my own use I decided to treat them as unscaled numbers that just indicate problems (a higher number is worse) and not worry too much about what the number really means. With the cgroup2 interface, which is supported by the version of systemd in Testing (and which has been included in Debian backports for Buster), you get PSI files for each cgroup. I've just uploaded version 1.3.5-2 of etbemon (package mon) to Debian/Unstable which displays the cgroups with PSI numbers greater than 0.5% when the load average test fails.
System CPU Pressure: avg10=0.87 avg60=0.99 avg300=1.00 total=20556310510
/system.slice avg10=0.86 avg60=0.92 avg300=0.97 total=18238772699
/system.slice/system-tor.slice avg10=0.85 avg60=0.69 avg300=0.60 total=11996599996
/system.slice/system-tor.slice/tor@default.service avg10=0.83 avg60=0.69 avg300=0.59 total=5358485146
System IO Pressure: avg10=18.30 avg60=35.85 avg300=42.85 total=310383148314
 full avg10=13.95 avg60=27.72 avg300=33.60 total=216001337513
/system.slice avg10=2.78 avg60=3.86 avg300=5.74 total=51574347007
/system.slice full avg10=1.87 avg60=2.87 avg300=4.36 total=35513103577
/system.slice/mariadb.service avg10=1.33 avg60=3.07 avg300=3.68 total=2559016514
/system.slice/mariadb.service full avg10=1.29 avg60=3.01 avg300=3.61 total=2508485595
/system.slice/matrix-synapse.service avg10=2.74 avg60=3.92 avg300=4.95 total=20466738903
/system.slice/matrix-synapse.service full avg10=2.74 avg60=3.92 avg300=4.95 total=20435187166
Above is an extract from the output of the loadaverage check. It shows that tor is a major user of CPU time (the VM runs a Tor relay node and has close to 100% of one core devoted to that task). It also shows that MariaDB and Matrix are the main users of disk IO. When I installed Matrix the Debian package told me that using SQLite would give lower performance than MySQL, but that didn't seem like a big deal as the server only has a few users. Maybe I should move Matrix to the MariaDB instance to improve overall system performance. So far I have not written any code to display the memory PSI files. I don't have a lack of RAM on the systems I run at the moment and don't have a good test case for this. I welcome patches from people who have the ability to test this and get some benefit from it. We are probably about 6 months away from a new release of Debian and this is probably the last thing I need to do to make etbemon ready for that.
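For readers who want to poke at the raw numbers themselves, here is a small sketch that parses PSI files; the /proc/pressure paths are the standard kernel interface, the cgroup path is an example matching the output above, and the 0.5% threshold mirrors the etbemon check described here:

import os

def read_pressure(path):
    # Parse a PSI file: lines look like
    #   some avg10=0.87 avg60=0.99 avg300=1.00 total=20556310510
    result = {}
    with open(path) as f:
        for line in f:
            kind, *fields = line.split()
            result[kind] = dict((k, float(v)) for k, v in
                                (field.split("=") for field in fields))
    return result

print(read_pressure("/proc/pressure/cpu"))   # system-wide CPU pressure

# With cgroup2, each cgroup gets its own cpu/io/memory pressure files:
svc = "/sys/fs/cgroup/system.slice/mariadb.service/io.pressure"
if os.path.exists(svc):
    psi = read_pressure(svc)
    if psi["some"]["avg10"] > 0.5:           # same 0.5% idea as etbemon
        print("IO pressure on mariadb:", psi["some"])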
